-
Automatic scene classification has applications ranging from urban planning to autonomous driving, yet little is known about how well these systems work across social differences. We investigate explicit and implicit biases in deep learning architectures, including deep convolutional neural networks (dCNNs) and multimodal large language models (MLLMs). We examined nearly one million images drawn from user-submitted photographs and Airbnb listings, spanning over 200 countries as well as all 3320 US counties. To isolate scene-specific biases, we ensured that no people appeared in any of the photos. We found significant explicit socioeconomic biases across all models, including lower classification accuracy, higher classification uncertainty, and an increased tendency to assign labels that could be offensive when applied to homes (e.g., “slum”) to images from homes with lower socioeconomic status. We also found significant implicit biases, with pictures from lower socioeconomic conditions aligning more closely with word embeddings for negative concepts. All trends were consistent across countries and within the diverse economic and racial landscapes of the United States. This research thus demonstrates a novel bias in computer vision, emphasizing the need for more inclusive and representative training datasets.
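As a rough illustration of the implicit-bias measure described above (not the paper's actual pipeline), the sketch below compares an image embedding's average cosine similarity to word embeddings for negative versus positive concepts. All embeddings here are random placeholders; in practice they would come from a joint image-text model such as a CLIP-style encoder, and the concept word lists are hypothetical.

```python
# Minimal sketch, assuming precomputed image and word embeddings from a
# shared embedding space. Positive return values mean the image aligns
# more with negative concepts than positive ones.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def implicit_association(img: np.ndarray,
                         negative: list[np.ndarray],
                         positive: list[np.ndarray]) -> float:
    neg = np.mean([cosine(img, w) for w in negative])
    pos = np.mean([cosine(img, w) for w in positive])
    return neg - pos

rng = np.random.default_rng(0)
img_emb = rng.normal(size=512)                         # placeholder image embedding
neg_words = [rng.normal(size=512) for _ in range(5)]   # e.g., "dirty", "unsafe" (hypothetical)
pos_words = [rng.normal(size=512) for _ in range(5)]   # e.g., "clean", "safe" (hypothetical)
print(implicit_association(img_emb, neg_words, pos_words))
```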
-
The same object can be described at multiple levels of abstraction (“parka”, “coat”, “clothing”), yet human observers consistently name objects at a mid-level of specificity known as the basic level. Little is known about the temporal dynamics involved in retrieving neural representations that prioritize the basic level, nor how these dynamics change with evolving task demands. In this study, observers viewed 1080 objects arranged in a three-tier category taxonomy while 64-channel EEG was recorded. Observers performed a categorical one-back task at either the basic or the subordinate level in different recording sessions. We used time-resolved multiple regression to assess the utility of superordinate-, basic-, and subordinate-level categories across the scalp. We found robust use of basic-level category information starting at about 50 ms after stimulus onset and moving from posterior electrodes (149 ms) through lateral (261 ms) to anterior sites (332 ms). Task differences were not evident in the first 200 ms of processing but were observed between 200 and 300 ms after stimulus presentation. Together, this work demonstrates that object category representations prioritize the basic level and do so relatively early, congruent with results showing that basic-level categorization is an automatic and obligatory process.
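The core analysis named above, time-resolved multiple regression, can be sketched as fitting one regression per electrode and time point, with trial-level category predictors explaining EEG amplitude. The sketch below is a generic version of that idea; the array shapes, random data, and predictor coding are assumptions, not the study's actual pipeline.

```python
# Hedged sketch of time-resolved multiple regression: at each electrode
# and time point, amplitude across trials is regressed on category-level
# predictors, yielding an R^2 time course per electrode.
import numpy as np
from sklearn.linear_model import LinearRegression

n_trials, n_electrodes, n_times, n_predictors = 200, 64, 300, 3
rng = np.random.default_rng(1)
eeg = rng.normal(size=(n_trials, n_electrodes, n_times))  # placeholder EEG data
X = rng.normal(size=(n_trials, n_predictors))  # superordinate/basic/subordinate codes (placeholder)

r2 = np.zeros((n_electrodes, n_times))
model = LinearRegression()
for e in range(n_electrodes):
    for t in range(n_times):
        y = eeg[:, e, t]
        r2[e, t] = model.fit(X, y).score(X, y)
# r2[e] traces when category structure becomes available at electrode e
```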
-
Scene memory has known spatial biases. Boundary extension is a well-known bias whereby observers remember visual information beyond an image’s boundaries. While recent studies demonstrate that boundary contraction also reliably occurs based on intrinsic image properties, the specific properties that drive the effect are unknown. This study assesses the extent to which scene memory might have a fixed capacity for information. We quantified both visual and semantic information in a scene database using techniques from image processing and natural language processing, respectively. We then assessed how each type of information predicted memory errors for scene boundaries using a standard rapid serial visual presentation (RSVP) forced-error paradigm. A linear regression model indicated that memories for scene boundaries were significantly predicted by semantic, but not visual, information, and that this effect persisted when scene depth was considered. Boundary extension was observed for images with low semantic information, and contraction was observed for images with high semantic information. This suggests a cognitive process that normalizes the amount of semantic information held in memory.
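The regression reported above can be sketched as an ordinary least squares model predicting a signed boundary-memory error (positive for extension, negative for contraction) from semantic and visual information scores, with scene depth as a covariate. Everything below is placeholder data and an assumed model form, shown only to make the analysis concrete.

```python
# Minimal sketch, assuming per-image information scores are already computed
# (semantic from text-based measures, visual from image-based measures).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
semantic = rng.normal(size=n)   # placeholder semantic-information estimate
visual = rng.normal(size=n)     # placeholder visual-information estimate
depth = rng.normal(size=n)      # placeholder scene-depth covariate
# Simulated outcome: extension for low semantic info, contraction for high
boundary_error = -0.4 * semantic + rng.normal(scale=1.0, size=n)

X = sm.add_constant(np.column_stack([semantic, visual, depth]))
fit = sm.OLS(boundary_error, X).fit()
print(fit.summary())   # coefficient signs and p-values per predictor
```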
-
A number of neuroimaging techniques have been employed to understand how visual information is transformed along the visual pathway. Although each technique has spatial and temporal limitations, each can provide important insights into the visual code. While the BOLD signal of fMRI can be quite informative, the visual code is not static, and these dynamics are obscured by fMRI’s poor temporal resolution. In this study, we leveraged the high temporal resolution of EEG to develop an encoding technique based on the distribution of responses generated by a population of real-world scenes. This approach maps neural signals onto each pixel within a given image and reveals location-specific transformations of the visual code, providing a spatiotemporal signature for the image at each electrode. Our analyses of the mapping results revealed that scenes undergo a series of nonuniform transformations that prioritize different spatial frequencies at different regions of scenes over time. This mapping technique offers a potential avenue for future studies to explore how dynamic feedforward and recurrent processes inform and refine high-level representations of our visual world.
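One simple way to picture a pixel-level encoding map, loosely in the spirit of the technique above but not its actual method, is to correlate each pixel's intensity across a population of scene images with the ERP amplitude at one electrode and time point, yielding an image-sized map of where that signal carries information. The shapes, random data, and choice of Pearson correlation below are all assumptions.

```python
# Illustrative sketch: per-pixel correlation between image intensity and a
# single electrode/time-point ERP response across many scenes.
import numpy as np

rng = np.random.default_rng(3)
n_images, h, w = 300, 32, 32
images = rng.random(size=(n_images, h, w))   # placeholder grayscale scenes
erp = rng.normal(size=n_images)              # one electrode, one time point

pix = images.reshape(n_images, -1)
pix_z = (pix - pix.mean(0)) / pix.std(0)     # z-score each pixel across images
erp_z = (erp - erp.mean()) / erp.std()
encoding_map = (pix_z * erp_z[:, None]).mean(0).reshape(h, w)  # Pearson r per pixel
```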
-
Human scene categorization is characterized by its remarkable speed. While many visual and conceptual features have been linked to this ability, significant correlations exist between feature spaces, impeding our ability to determine their relative contributions to scene categorization. Here, we used a whitening transformation to decorrelate a variety of visual and conceptual features and assess the time course of their unique contributions to scene categorization. Participants (both sexes) viewed 2250 full-color scene images drawn from 30 different scene categories while their brain activity was measured with 256-channel EEG. We examined the variance explained at each electrode and time point of the visual event-related potential (vERP) data by nine different whitened encoding models, ranging from low-level features obtained from filter outputs to high-level conceptual features requiring human annotation. The amount of category information in the vERPs was assessed through multivariate decoding methods. Behavioral similarity measures were obtained in separate crowdsourced experiments. We found that all nine models together contributed 78% of the variance of human scene similarity assessments and were within the noise ceiling of the vERP data. Low-level models explained earlier vERP variability (88 ms after image onset), whereas high-level models explained later variance (169 ms). Critically, only high-level models shared vERP variability with behavior. Together, these results suggest that scene categorization is primarily a high-level process, but one reliant on previously extracted low-level features.
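The whitening step named above can be sketched as a ZCA transform: given per-image scores from several correlated feature models, it produces decorrelated predictors so each model's unique contribution can be assessed. The data below are placeholders, and the exact whitening variant used in the study may differ.

```python
# Sketch of ZCA whitening to decorrelate feature-model predictors.
import numpy as np

rng = np.random.default_rng(4)
n_images, n_models = 2250, 9
F = rng.normal(size=(n_images, n_models))      # placeholder model predictors
F = F @ rng.normal(size=(n_models, n_models))  # induce correlations for the demo

Fc = F - F.mean(0)
cov = Fc.T @ Fc / (n_images - 1)
vals, vecs = np.linalg.eigh(cov)
zca = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
F_white = Fc @ zca                             # columns are now uncorrelated
print(np.round(np.corrcoef(F_white.T), 6))     # approximately the identity matrix
```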
-
Visual scene category representations emerge very rapidly, yet the computational transformations that enable such invariant categorizations remain elusive. Deep convolutional neural networks (CNNs) perform visual categorization at near human-level accuracy using a feedforward architecture, providing neuroscientists with the opportunity to assess one successful series of representational transformations that enable categorization in silico. The goal of the current study is to assess the extent to which sequential scene category representations built by a CNN map onto those built in the human brain as assessed by high-density, time-resolved event-related potentials (ERPs). We found correspondence both over time and across the scalp: earlier (0–200 ms) ERP activity was best explained by early CNN layers at all electrodes. Although later activity at most electrode sites corresponded to earlier CNN layers, activity in right occipito-temporal electrodes was best explained by the later, fully connected layers of the CNN around 225 ms post-stimulus, along with similar patterns in frontal electrodes. Taken together, these results suggest that scene category representations emerge through a dynamic interplay between early activity over occipital electrodes and later activity over temporal and frontal electrodes.
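A common way to compute this kind of layer-to-ERP correspondence is representational similarity analysis: build a representational dissimilarity matrix (RDM) per CNN layer and per ERP time point, then correlate them to find which layer best explains the neural pattern at each moment. The sketch below assumes that framing; the inputs are random placeholders and this is not necessarily the study's exact procedure.

```python
# Hedged RSA-style sketch of mapping CNN layers onto ERP time points.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
n_stim, n_layers, n_times = 100, 5, 50
layer_feats = [rng.normal(size=(n_stim, 256)) for _ in range(n_layers)]  # placeholder activations
erp = rng.normal(size=(n_stim, 64, n_times))   # stimulus x electrode x time (placeholder)

layer_rdms = [pdist(f, metric="correlation") for f in layer_feats]
best_layer = np.zeros(n_times, dtype=int)
for t in range(n_times):
    erp_rdm = pdist(erp[:, :, t], metric="correlation")
    rhos = [spearmanr(erp_rdm, l)[0] for l in layer_rdms]
    best_layer[t] = int(np.argmax(rhos))
# best_layer traces which CNN stage best matches the ERP pattern over time
```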
-
Human scene categorization is rapid and robust, but we have little understanding of how individual features contribute to categorization, nor of the time scale of their contribution. This issue is compounded by the non-independence of the many candidate features. Here, we used singular value decomposition to orthogonalize 11 different scene descriptors that included both visual and semantic features. Using high-density EEG and regression analyses, we observed that most explained variability was carried by a late layer of a deep convolutional neural network, as well as by a model of a scene’s functions given by the American Time Use Survey. Furthermore, features that explained more variance also tended to explain earlier variance. These results extend previous large-scale behavioral results showing the importance of functional features for scene categorization. They also fail to support models of visual perception that are encapsulated from higher-level cognitive attributes.
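The orthogonalization step named above can be sketched with a plain SVD: the scaled left singular vectors of the centered image-by-descriptor matrix give mutually orthogonal components spanning the same space, which can then serve as decorrelated regressors for the EEG analysis. The data and dimensions below are placeholders.

```python
# Sketch of SVD-based orthogonalization of correlated scene descriptors.
import numpy as np

rng = np.random.default_rng(6)
n_images, n_descriptors = 1000, 11
D = rng.normal(size=(n_images, n_descriptors))  # placeholder descriptor scores
Dc = D - D.mean(0)                              # center each descriptor

U, S, Vt = np.linalg.svd(Dc, full_matrices=False)
components = U * S            # orthogonal columns spanning the same space as Dc
# Cross-products vanish off the diagonal, confirming orthogonality:
print(np.round(components.T @ components / (n_images - 1), 4))
```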